A Data Type-Driven Property Alignment Framework for Product Duplicate Detection on the Web
Authors
Abstract
Over the last decade, daily life has morphed into a world of ubiquitous broadband, where devices enable constant engagement. As a consequence, e-commerce has grown immensely. Despite the market opportunities for retailers and the ease with which customers can acquire products through webshops, the shift to digital retail has its drawbacks. For example, it leads to cluttered and incomparable product information across different webshops, which calls for an automated method to restore homogeneity in product representations. This paper presents a product duplicate detection solution that exploits a data type-driven property alignment framework. Based on the performed experiment, we show a statistically significant improvement in F1-score over an existing state-of-the-art approach, from 47.91% to 78.13%.
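The abstract does not describe the framework's internals. As a rough illustration of the general idea behind data type-driven property alignment, the Python sketch below assumes that each product is a dictionary of key-value properties, that every value is classified into one of three hypothetical data types (numeric, boolean, text), and that two properties are only compared when their values share a type, using a similarity measure chosen for that type. The type taxonomy, the similarity functions, the 50/50 weighting of key-name and value similarity, the greedy one-to-one matching, and the `min_score` threshold are all illustrative assumptions, not the authors' method.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical data types for property values (an assumption, not the paper's taxonomy).
NUMERIC, BOOLEAN, TEXT = "numeric", "boolean", "text"

# A number optionally followed by a unit, e.g. "55", "3.5kg", "120 hz", '55"'.
_NUMBER_UNIT = re.compile(r"^(\d+(?:\.\d+)?)\s*([a-z\"]*)$")
_TRUE, _FALSE = {"yes", "true"}, {"no", "false"}


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a property key or value."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def infer_type(value: str) -> str:
    """Classify a raw property value into one of the hypothetical data types."""
    v = value.strip().lower()
    if v in _TRUE | _FALSE:
        return BOOLEAN
    if _NUMBER_UNIT.match(v):
        return NUMERIC
    return TEXT


def value_similarity(a: str, b: str, dtype: str) -> float:
    """Compare two values with a measure chosen by their shared data type."""
    a, b = a.strip().lower(), b.strip().lower()
    if dtype == BOOLEAN:
        return 1.0 if (a in _TRUE) == (b in _TRUE) else 0.0
    if dtype == NUMERIC:
        # Units are ignored here; only the numeric magnitudes are compared.
        x = float(_NUMBER_UNIT.match(a).group(1))
        y = float(_NUMBER_UNIT.match(b).group(1))
        largest = max(abs(x), abs(y))
        return 1.0 if largest == 0 else 1.0 - abs(x - y) / largest
    return jaccard(tokens(a), tokens(b))


def align_properties(p1: Dict[str, str], p2: Dict[str, str],
                     min_score: float = 0.5) -> List[Tuple[str, str, float]]:
    """Greedy one-to-one alignment of properties whose values share a data type."""
    candidates = []
    for k1, v1 in p1.items():
        t1 = infer_type(v1)
        for k2, v2 in p2.items():
            if infer_type(v2) != t1:
                continue  # only properties of the same data type are compared
            score = (0.5 * jaccard(tokens(k1), tokens(k2))
                     + 0.5 * value_similarity(v1, v2, t1))
            if score >= min_score:
                candidates.append((score, k1, k2))
    aligned, used1, used2 = [], set(), set()
    # Highest-scoring pairs are accepted first; each key is used at most once.
    for score, k1, k2 in sorted(candidates, reverse=True):
        if k1 not in used1 and k2 not in used2:
            aligned.append((k1, k2, score))
            used1.add(k1)
            used2.add(k2)
    return aligned


# Illustrative (made-up) product descriptions from two webshops.
shop_a = {"Screen Size": "55 inch", "Refresh Rate": "120Hz", "Smart TV": "Yes"}
shop_b = {"Display Size": '55"', "Refresh Rate": "120 Hz", "Smart": "yes", "Brand": "Samsung"}
# Aligns Refresh Rate/Refresh Rate, Smart TV/Smart, and Screen Size/Display Size.
print(align_properties(shop_a, shop_b))
```

Restricting comparisons to same-typed values is what makes the alignment "type-driven" in this sketch: a numeric screen size is never scored against a textual brand name, and each type gets a measure that suits it (relative difference for numbers, equality for booleans, token overlap for text).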
Similar resources
A Hybrid Model Words-Driven Approach for Web Product Duplicate Detection
The detection of product duplicates is one of the challenges that Web shop aggregators are currently facing. In this paper, we focus on solving the problem of product duplicate detection on the Web. Our proposed method extends a state-of-the-art solution that uses the model words in product titles to find duplicate products. First, we employ the aforementioned algorithm in order to find matchin...
The Effect of Property Rights on Entrepreneurship:Evidence from Some Factor-driven, Efficiency-driven, and Innovation-driven Countries
Entrepreneurship is influenced by many factors and environments such as institutions. Institutions have an important role to play in the individual's tendency toward necessity and opportunity entrepreneurship. The purpose of this paper was to examine the impact of institutional quality (property rights) on opportunity and necessity entrepreneurship. The results, based on unbalanced panel data f...
A New Method for Duplicate Detection Using Hierarchical Clustering of Records
Accuracy and validity of data are prerequisites for the proper operation of any software system. There is always a possibility of errors occurring in data due to human and system faults. One of these errors is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for some reasons like aggregation of ...
Identification of BKCa channel openers by molecular field alignment and patent data-driven analysis
In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...
Lived experience Consumers in online stores based on the Stimulator-Organism-Response Framework (SOR)
In this study, in order to develop a comprehensive framework of consumer experience with online retailers based on the stimulus-organism-response (SOR) framework, the impact of online store environment elements (web quality and brand Web site) as predictors of consumers' emotional responses, cognitive responses (trust and perceived risk), and behavioral responses (want to buy) are disc...
Publication date: 2016